Overview


Dataset statistics

Number of variables: 18
Number of observations: 2820232
Missing cells: 546560
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.3 GiB
Average record size in memory: 495.3 B
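
The percentage and size figures in this panel follow from the raw counts. A minimal Python sketch of that arithmetic, assuming (the numbers suggest it, but the report does not state it) that the missing-cell percentage is taken over all cells, that is variables times observations, and that GiB means 2**30 bytes:

```python
# Values copied from the dataset statistics in this report.
n_variables = 18
n_observations = 2_820_232
missing_cells = 546_560
avg_record_bytes = 495.3

# Share of missing cells over the whole table (assumed denominator: all cells).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total memory implied by the average record size, in GiB (2**30 bytes).
total_gib = avg_record_bytes * n_observations / 2**30

print(f"missing cells: {missing_pct:.1f}%")
print(f"total size in memory: {total_gib:.1f} GiB")
```

Both results round to the reported 1.1% and 1.3 GiB, so the panel is internally consistent under these assumptions.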

Variable types

Text: 2
Numeric: 11
Boolean: 1
Categorical: 4

Alerts

POSSIBLENterm has constant value "True" (Constant)
Insidesource has constant value "TMHMM2.0" (Constant)
TMhelixsource has constant value "TMHMM2.0" (Constant)
Outsidesource has constant value "TMHMM2.0" (Constant)
ExpnumberofAAsinTMHs is highly overall correlated with Insideend and 5 other fields (High correlation)
Insideend is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields (High correlation)
Insidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields (High correlation)
Length is highly overall correlated with Insideend and 1 other field (High correlation)
Outsideend is highly overall correlated with Length and 3 other fields (High correlation)
Outsidestart is highly overall correlated with ExpnumberofAAsinTMHs and 4 other fields (High correlation)
PredictedTMHsNumber is highly overall correlated with ExpnumberofAAsinTMHs and 5 other fields (High correlation)
TMhelixend is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields (High correlation)
TMhelixstart is highly overall correlated with ExpnumberofAAsinTMHs and 6 other fields (High correlation)
POSSIBLENterm has 546560 (19.4%) missing values (Missing)
Protein_ID has unique values (Unique)
Expnumberfirst60AAs has 143860 (5.1%) zeros (Zeros)
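
The Constant, Missing, Unique and Zeros labels above can be read as simple per-column checks. The sketch below is illustrative only: `column_alerts` is a hypothetical helper written for this note, not ydata-profiling's implementation, and it ignores the thresholds the tool applies before raising an alert.

```python
from typing import Any, Dict, List, Optional


def column_alerts(values: List[Optional[Any]]) -> Dict[str, bool]:
    """Return simple flags for one column; None stands for a missing cell."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    return {
        # Exactly one distinct non-missing value.
        "constant": len(distinct) == 1,
        # At least one missing cell.
        "missing": len(present) < len(values),
        # Every non-missing value occurs exactly once.
        "unique": len(present) > 0 and len(distinct) == len(present),
        # At least one numeric zero (booleans are not counted as zeros).
        "zeros": any(v == 0 and not isinstance(v, bool) for v in present),
    }


# Toy columns shaped like POSSIBLENterm, Protein_ID and Expnumberfirst60AAs.
print(column_alerts([True, None, True]))
print(column_alerts(["NP_039595.1", "NP_039601.1", "NP_039602.1"]))
print(column_alerts([0.0, 18.23661, 24.87583]))
```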

Reproduction

Analysis started: 2025-07-10 08:39:03.908468
Analysis finished: 2025-07-10 08:41:22.796491
Duration: 2 minutes and 18.89 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
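
The reported duration can be checked against the two timestamps. A small sketch using only the Python standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction panel.
started = datetime.fromisoformat("2025-07-10 08:39:03.908468")
finished = datetime.fromisoformat("2025-07-10 08:41:22.796491")

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```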

Variables

Distinct: 589382
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 195.6 MiB

Length

Max length: 88
Median length: 87
Mean length: 23.728165
Min length: 5

Characters and Unicode

Total characters: 66918929
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 107527
Unique (%): 3.8%
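
Distinct and Unique are different counts here: Distinct is the number of different values, while Unique is the number of values that occur exactly once, which is why 589382 distinct values coexist with only 107527 unique ones. A minimal sketch of the distinction, using the five sample rows this report lists for the variable:

```python
from collections import Counter

# The five sample rows shown for this variable in the report.
sample = [
    "NC_001330.1",
    "NC_001331.1",
    "NC_001331.1",
    "NC_001331.1",
    "NC_001331.1",
]

counts = Counter(sample)

# Distinct: how many different values appear at all.
n_distinct = len(counts)
# Unique: how many values appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(n_distinct, n_unique)
```

On these five rows there are two distinct values but only one unique value, because NC_001331.1 repeats.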

Sample

1st row: NC_001330.1
2nd row: NC_001331.1
3rd row: NC_001331.1
4th row: NC_001331.1
5th row: NC_001331.1

Words

Value                   Count     Frequency (%)
samn01774283_a1_ct717   118       < 0.1%
uvig_134152             104       < 0.1%
samn01773488_b1_ct3     101       < 0.1%
uvig_20542              97        < 0.1%
mgv-genome-0380253      96        < 0.1%
mgv-genome-0380244      95        < 0.1%
uvig_544214             91        < 0.1%
mgv-genome-0380194      88        < 0.1%
mgv-genome-0380160      88        < 0.1%
mgv-genome-0380240      87        < 0.1%
Other values (589372)   2819267   > 99.9%

Most occurring characters

Value               Count      Frequency (%)
_                   6419020    9.6%
1                   3451421    5.2%
0                   2972563    4.4%
3                   2820088    4.2%
2                   2793748    4.2%
E                   2333405    3.5%
4                   2286551    3.4%
5                   2254271    3.4%
M                   2126961    3.2%
7                   2090878    3.1%
Other values (55)   37370023   55.8%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   66918929   100.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   66918929   100.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   66918929   100.0%

Protein_ID
Text

Unique 

Distinct: 2820232
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 203.2 MiB

Length

Max length: 91
Median length: 89
Mean length: 26.540974
Min length: 7

Characters and Unicode

Total characters: 74851703
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2820232
Unique (%): 100.0%

Sample

1st row: NP_039595.1
2nd row: NP_039601.1
3rd row: NP_039602.1
4th row: NP_039603.1
5th row: NP_039604.1

Words

Value                    Count     Frequency (%)
np_039699.1              1         < 0.1%
biochar_6180_5           1         < 0.1%
np_039595.1              1         < 0.1%
np_039601.1              1         < 0.1%
np_039602.1              1         < 0.1%
np_039603.1              1         < 0.1%
np_039604.1              1         < 0.1%
np_039606.1              1         < 0.1%
biochar_6126_21          1         < 0.1%
biochar_6133_4           1         < 0.1%
Other values (2820222)   2820222   > 99.9%

Most occurring characters

Value               Count      Frequency (%)
_                   9172554    12.3%
1                   4398427    5.9%
2                   3533107    4.7%
3                   3474835    4.6%
0                   3294552    4.4%
4                   2842511    3.8%
5                   2735874    3.7%
7                   2458803    3.3%
6                   2447491    3.3%
E                   2335532    3.1%
Other values (55)   38158017   51.0%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   74851703   100.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   74851703   100.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   74851703   100.0%

Length
Real number (ℝ)

High correlation 

Distinct: 3523
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.91134
Minimum: 18
Maximum: 13719
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 18
5-th percentile: 47
Q1: 81
Median: 129
Q3: 218
95-th percentile: 788
Maximum: 13719
Range: 13701
Interquartile range (IQR): 137

Descriptive statistics

Standard deviation: 291.51083
Coefficient of variation (CV): 1.3195829
Kurtosis: 46.660059
Mean: 220.91134
Median Absolute Deviation (MAD): 58
Skewness: 4.8169991
Sum: 6.2302122 × 10^8
Variance: 84978.562
Monotonicity: Not monotonic
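
Several of the figures in this panel are derived from others: the range and IQR from the quantiles, the coefficient of variation and variance from the mean and standard deviation, and the sum from the mean and the number of observations. A short sketch that recomputes them from the reported values for Length (small differences in the last digits are expected because the inputs are already rounded):

```python
import math

# Values copied from the Length panels and the dataset overview.
n_observations = 2_820_232
minimum, maximum = 18, 13_719
q1, q3 = 81, 218
mean, std = 220.91134, 291.51083

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(value_range, iqr)
print(f"CV={cv:.7f} variance={variance:.3f} sum={total:.7e}")

# Compare against the reported figures within a loose relative tolerance.
assert math.isclose(cv, 1.3195829, rel_tol=1e-6)
assert math.isclose(variance, 84978.562, rel_tol=1e-6)
assert math.isclose(total, 6.2302122e8, rel_tol=1e-6)
```

The same relations apply to the other numeric variables in this report.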
[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
71                    21415     0.8%
66                    20631     0.7%
68                    20549     0.7%
60                    20340     0.7%
67                    19434     0.7%
70                    19106     0.7%
55                    18774     0.7%
77                    18237     0.6%
93                    18037     0.6%
99                    17882     0.6%
Other values (3513)   2625827   93.1%

Minimum 10 values

Value   Count   Frequency (%)
18      1       < 0.1%
20      27      < 0.1%
21      35      < 0.1%
22      52      < 0.1%
23      79      < 0.1%
24      137     < 0.1%
25      166     < 0.1%
26      202     < 0.1%
27      306     < 0.1%
28      458     < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
13719   1       < 0.1%
13380   1       < 0.1%
9455    1       < 0.1%
9097    1       < 0.1%
8731    1       < 0.1%
8300    1       < 0.1%
7972    1       < 0.1%
7748    1       < 0.1%
7700    2       < 0.1%
7699    2       < 0.1%

PredictedTMHsNumber
Real number (ℝ)

High correlation 

Distinct: 34
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8934964
Minimum: 1
Maximum: 48
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 48
Range: 47
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.8690251
Coefficient of variation (CV): 0.98707613
Kurtosis: 27.54299
Mean: 1.8934964
Median Absolute Deviation (MAD): 0
Skewness: 4.4129981
Sum: 5340099
Variance: 3.4932546
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=34)]
Common values

Value               Count     Frequency (%)
1                   1670464   59.2%
2                   676617    24.0%
3                   209168    7.4%
4                   108505    3.8%
5                   39806     1.4%
6                   33762     1.2%
10                  15570     0.6%
7                   14218     0.5%
8                   12937     0.5%
12                  9254      0.3%
Other values (24)   29931     1.1%

Minimum 10 values

Value   Count     Frequency (%)
1       1670464   59.2%
2       676617    24.0%
3       209168    7.4%
4       108505    3.8%
5       39806     1.4%
6       33762     1.2%
7       14218     0.5%
8       12937     0.5%
9       7808      0.3%
10      15570     0.6%

Maximum 10 values

Value   Count   Frequency (%)
48      1       < 0.1%
36      2       < 0.1%
34      2       < 0.1%
32      16      < 0.1%
30      13      < 0.1%
29      6       < 0.1%
28      30      < 0.1%
27      6       < 0.1%
26      103     < 0.1%
25      13      < 0.1%

ExpnumberofAAsinTMHs
Real number (ℝ)

High correlation 

Distinct: 983818
Distinct (%): 34.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.850682
Minimum: 6.44741
Maximum: 1029.8286
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 6.44741
5-th percentile: 17.44989
Q1: 20.85793
Median: 23.14312
Q3: 44.29862
95-th percentile: 112.84403
Maximum: 1029.8286
Range: 1023.3812
Interquartile range (IQR): 23.44069

Descriptive statistics

Standard deviation: 43.234392
Coefficient of variation (CV): 1.033063
Kurtosis: 27.438665
Mean: 41.850682
Median Absolute Deviation (MAD): 5.379675
Skewness: 4.3714055
Sum: 1.1802863 × 10^8
Variance: 1869.2127
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                   Count     Frequency (%)
18.23661                3989      0.1%
24.87583                3985      0.1%
36.04048                2768      0.1%
210.43458               2391      0.1%
37.75642                2117      0.1%
47.86547                1952      0.1%
108.33627               1755      0.1%
22.0628                 1703      0.1%
22.05877                1691      0.1%
20.65344                1463      0.1%
Other values (983808)   2796418   99.2%

Minimum 10 values

Value     Count   Frequency (%)
6.44741   2       < 0.1%
6.55333   3       < 0.1%
6.71723   1       < 0.1%
6.76893   1       < 0.1%
6.83648   1       < 0.1%
6.87749   1       < 0.1%
7.01696   1       < 0.1%
7.10047   1       < 0.1%
7.16215   1       < 0.1%
7.16231   1       < 0.1%

Maximum 10 values

Value        Count   Frequency (%)
1029.82865   1       < 0.1%
793.04047    1       < 0.1%
782.99323    1       < 0.1%
777.33773    1       < 0.1%
775.5699     1       < 0.1%
750.08628    1       < 0.1%
740.20257    1       < 0.1%
740.12692    1       < 0.1%
739.44781    1       < 0.1%
725.55356    1       < 0.1%

Expnumberfirst60AAs
Real number (ℝ)

Zeros 

Distinct: 786820
Distinct (%): 27.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.514973
Minimum: 0
Maximum: 52.88279
Zeros: 143860
Zeros (%): 5.1%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 16.436062
Median: 21.199435
Q3: 25.097315
95-th percentile: 41.50342
Maximum: 52.88279
Range: 52.88279
Interquartile range (IQR): 8.6612525

Descriptive statistics

Standard deviation: 12.212541
Coefficient of variation (CV): 0.5952989
Kurtosis: -0.45859958
Mean: 20.514973
Median Absolute Deviation (MAD): 4.443205
Skewness: -0.10037227
Sum: 57856984
Variance: 149.14616
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                   Count     Frequency (%)
0                       143860    5.1%
0.00018                 5640      0.2%
18.23661                3989      0.1%
24.87583                3985      0.1%
42.15085                3664      0.1%
0.00019                 3210      0.1%
0.0002                  3166      0.1%
36.04048                2768      0.1%
1 × 10^-5               2592      0.1%
0.00017                 2284      0.1%
Other values (786810)   2645074   93.8%

Minimum 10 values

Value       Count    Frequency (%)
0           143860   5.1%
1 × 10^-5   2592     0.1%
2 × 10^-5   1341     < 0.1%
3 × 10^-5   1343     < 0.1%
4 × 10^-5   900      < 0.1%
5 × 10^-5   768      < 0.1%
6 × 10^-5   927      < 0.1%
7 × 10^-5   794      < 0.1%
8 × 10^-5   1161     < 0.1%
9 × 10^-5   1054     < 0.1%

Maximum 10 values

Value      Count   Frequency (%)
52.88279   2       < 0.1%
52.56452   1       < 0.1%
52.4412    6       < 0.1%
52.39016   1       < 0.1%
52.34317   1       < 0.1%
52.10504   1       < 0.1%
51.57802   1       < 0.1%
51.56347   1       < 0.1%
51.33831   1       < 0.1%
51.33708   2       < 0.1%

TotalprobofNin
Real number (ℝ)

Distinct: 99923
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.58968644
Minimum: 0
Maximum: 1
Zeros: 157
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.02219
Q1: 0.23569
Median: 0.69128
Q3: 0.92597
95-th percentile: 0.99604
Maximum: 1
Range: 1
Interquartile range (IQR): 0.69028

Descriptive statistics

Standard deviation: 0.35280531
Coefficient of variation (CV): 0.59829306
Kurtosis: -1.4057218
Mean: 0.58968644
Median Absolute Deviation (MAD): 0.27981
Skewness: -0.37835311
Sum: 1663052.6
Variance: 0.12447158
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                  Count     Frequency (%)
0.59265                3993      0.1%
0.56701                3988      0.1%
0.99854                3981      0.1%
0.03881                3512      0.1%
0.95017                3014      0.1%
0.86194                2792      0.1%
0.28216                1964      0.1%
0.99957                1898      0.1%
0.99602                1846      0.1%
0.99959                1839      0.1%
Other values (99913)   2791405   99.0%

Minimum 10 values

Value       Count   Frequency (%)
0           157     < 0.1%
1 × 10^-5   286     < 0.1%
2 × 10^-5   318     < 0.1%
3 × 10^-5   252     < 0.1%
4 × 10^-5   389     < 0.1%
5 × 10^-5   238     < 0.1%
6 × 10^-5   625     < 0.1%
7 × 10^-5   176     < 0.1%
8 × 10^-5   155     < 0.1%
9 × 10^-5   95      < 0.1%

Maximum 10 values

Value     Count   Frequency (%)
1         591     < 0.1%
0.99999   987     < 0.1%
0.99998   945     < 0.1%
0.99997   948     < 0.1%
0.99996   860     < 0.1%
0.99995   1134    < 0.1%
0.99994   634     < 0.1%
0.99993   726     < 0.1%
0.99992   669     < 0.1%
0.99991   621     < 0.1%

POSSIBLENterm
Boolean

Constant  Missing 

Distinct: 1
Distinct (%): < 0.1%
Missing: 546560
Missing (%): 19.4%
Memory size: 94.7 MiB

Value       Count     Frequency (%)
True        2273672   80.6%
(Missing)   546560    19.4%
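
POSSIBLENterm only ever takes the value True, and the remaining rows are missing. One plausible reading, which is an assumption and not something this report states, is that a missing entry means the flag was simply not raised for that protein. Under that assumption the column could be turned into a complete boolean as sketched below; `fill_missing_flag` is a hypothetical helper written for this note.

```python
from typing import List, Optional


def fill_missing_flag(values: List[Optional[bool]]) -> List[bool]:
    """Map missing entries (None) to False and keep True as True.

    Assumes that a missing value means "flag not raised".
    """
    return [v is True for v in values]


# Counts copied from the POSSIBLENterm table in this report.
n_true, n_missing = 2_273_672, 546_560
n_total = n_true + n_missing
missing_pct = 100 * n_missing / n_total

flags = fill_missing_flag([True, None, True])

print(n_total, f"{missing_pct:.1f}%", flags)
```

The two counts add up to the 2820232 observations in the dataset, and the missing share rounds to the reported 19.4%.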

Insidesource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.3 MiB
TMHMM2.0: 2820232

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 22561856
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value      Count     Frequency (%)
TMHMM2.0   2820232   100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value      Count     Frequency (%)
tmhmm2.0   2820232   100.0%

Most occurring characters

Value   Count     Frequency (%)
M       8460696   37.5%
T       2820232   12.5%
H       2820232   12.5%
2       2820232   12.5%
.       2820232   12.5%
0       2820232   12.5%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Insidestart
Real number (ℝ)

High correlation 

Distinct: 2649
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.334258
Minimum: 1
Maximum: 13249
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 33
Q3: 86
95-th percentile: 465
Maximum: 13249
Range: 13248
Interquartile range (IQR): 85

Descriptive statistics

Standard deviation: 195.32107
Coefficient of variation (CV): 2.092705
Kurtosis: 133.92739
Mean: 93.334258
Median Absolute Deviation (MAD): 32
Skewness: 7.1304808
Sum: 2.6322426 × 10^8
Variance: 38150.32
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
1                     935612    33.2%
27                    101890    3.6%
28                    92965     3.3%
33                    74995     2.7%
38                    62908     2.2%
24                    58822     2.1%
22                    46873     1.7%
25                    37458     1.3%
43                    34044     1.2%
23                    26591     0.9%
Other values (2639)   1348074   47.8%

Minimum 10 values

Value   Count    Frequency (%)
1       935612   33.2%
19      1105     < 0.1%
20      1683     0.1%
21      892      < 0.1%
22      46873    1.7%
23      26591    0.9%
24      58822    2.1%
25      37458    1.3%
26      9835     0.3%
27      101890   3.6%

Maximum 10 values

Value   Count   Frequency (%)
13249   1       < 0.1%
7965    1       < 0.1%
7680    2       < 0.1%
7679    2       < 0.1%
7675    4       < 0.1%
7674    13      < 0.1%
7673    6       < 0.1%
7672    2       < 0.1%
7661    1       < 0.1%
7561    1       < 0.1%

Insideend
Real number (ℝ)

High correlation 

Distinct: 2719
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 139.64195
Minimum: 1
Maximum: 13380
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 36
Median: 86
Q3: 152
95-th percentile: 523
Maximum: 13380
Range: 13379
Interquartile range (IQR): 116

Descriptive statistics

Standard deviation: 208.04787
Coefficient of variation (CV): 1.4898665
Kurtosis: 106.17795
Mean: 139.64195
Median Absolute Deviation (MAD): 56
Skewness: 6.2603977
Sum: 3.9382271 × 10^8
Variance: 43283.918
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
6                     217509    7.7%
4                     76991     2.7%
12                    75715     2.7%
11                    51710     1.8%
20                    48564     1.7%
19                    29770     1.1%
8                     24743     0.9%
67                    21399     0.8%
1                     20753     0.7%
55                    20245     0.7%
Other values (2709)   2232833   79.2%

Minimum 10 values

Value   Count    Frequency (%)
1       20753    0.7%
2       2875     0.1%
4       76991    2.7%
6       217509   7.7%
8       24743    0.9%
10      1894     0.1%
11      51710    1.8%
12      75715    2.7%
15      3668     0.1%
16      9678     0.3%

Maximum 10 values

Value   Count   Frequency (%)
13380   1       < 0.1%
7972    1       < 0.1%
7700    2       < 0.1%
7699    2       < 0.1%
7695    4       < 0.1%
7694    13      < 0.1%
7693    6       < 0.1%
7692    2       < 0.1%
7681    1       < 0.1%
7570    1       < 0.1%

TMhelixsource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.3 MiB
TMHMM2.0: 2820232

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 22561856
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value      Count     Frequency (%)
TMHMM2.0   2820232   100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value      Count     Frequency (%)
tmhmm2.0   2820232   100.0%

Most occurring characters

Value   Count     Frequency (%)
M       8460696   37.5%
T       2820232   12.5%
H       2820232   12.5%
2       2820232   12.5%
.       2820232   12.5%
0       2820232   12.5%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

TMhelixstart
Real number (ℝ)

High correlation 

Distinct: 2653
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.298779
Minimum: 2
Maximum: 13226
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 10
Median: 36
Q3: 90
95-th percentile: 471
Maximum: 13226
Range: 13224
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 196.28246
Coefficient of variation (CV): 2.0173168
Kurtosis: 129.90122
Mean: 97.298779
Median Absolute Deviation (MAD): 29
Skewness: 7.0253554
Sum: 2.7440513 × 10^8
Variance: 38526.804
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
7                     218079    7.7%
5                     197680    7.0%
4                     188617    6.7%
10                    98390     3.5%
13                    80756     2.9%
15                    67965     2.4%
20                    59910     2.1%
12                    51934     1.8%
21                    48571     1.7%
39                    34572     1.2%
Other values (2643)   1773758   62.9%

Minimum 10 values

Value   Count    Frequency (%)
2       20753    0.7%
3       2875     0.1%
4       188617   6.7%
5       197680   7.0%
6       15436    0.5%
7       218079   7.7%
9       24805    0.9%
10      98390    3.5%
11      5227     0.2%
12      51934    1.8%

Maximum 10 values

Value   Count   Frequency (%)
13226   1       < 0.1%
7943    1       < 0.1%
7657    2       < 0.1%
7656    2       < 0.1%
7652    4       < 0.1%
7651    13      < 0.1%
7650    6       < 0.1%
7649    2       < 0.1%
7638    1       < 0.1%
7538    1       < 0.1%

TMhelixend
Real number (ℝ)

High correlation 

Distinct: 2673
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 118.14131
Minimum: 16
Maximum: 13248
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 16
5-th percentile: 24
Q1: 31
Median: 57
Q3: 112
95-th percentile: 491
Maximum: 13248
Range: 13232
Interquartile range (IQR): 81

Descriptive statistics

Standard deviation: 196.51164
Coefficient of variation (CV): 1.663361
Kurtosis: 129.39934
Mean: 118.14131
Median Absolute Deviation (MAD): 30
Skewness: 7.0105449
Sum: 3.3318589 × 10^8
Variance: 38616.826
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
29                    155069    5.5%
26                    136386    4.8%
27                    133669    4.7%
24                    91901     3.3%
32                    74386     2.6%
35                    60142     2.1%
23                    52527     1.9%
37                    51684     1.8%
42                    47586     1.7%
34                    46858     1.7%
Other values (2663)   1970024   69.9%

Minimum 10 values

Value   Count   Frequency (%)
16      6       < 0.1%
17      24      < 0.1%
18      797     < 0.1%
19      3976    0.1%
20      2123    0.1%
21      39480   1.4%
22      34255   1.2%
23      52527   1.9%
24      91901   3.3%
25      19524   0.7%

Maximum 10 values

Value   Count   Frequency (%)
13248   1       < 0.1%
7964    1       < 0.1%
7679    2       < 0.1%
7678    2       < 0.1%
7674    4       < 0.1%
7673    13      < 0.1%
7672    6       < 0.1%
7671    2       < 0.1%
7660    1       < 0.1%
7560    1       < 0.1%

Outsidesource
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 153.3 MiB
TMHMM2.0: 2820232

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 22561856
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TMHMM2.0
2nd row: TMHMM2.0
3rd row: TMHMM2.0
4th row: TMHMM2.0
5th row: TMHMM2.0

Common Values

Value      Count     Frequency (%)
TMHMM2.0   2820232   100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value      Count     Frequency (%)
tmhmm2.0   2820232   100.0%

Most occurring characters

Value   Count     Frequency (%)
M       8460696   37.5%
T       2820232   12.5%
H       2820232   12.5%
2       2820232   12.5%
.       2820232   12.5%
0       2820232   12.5%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   22561856   100.0%

Outsidestart
Real number (ℝ)

High correlation 

Distinct: 2229
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 89.368286
Minimum: 1
Maximum: 6719
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 35
Q3: 85
95-th percentile: 425
Maximum: 6719
Range: 6718
Interquartile range (IQR): 84

Descriptive statistics

Standard deviation: 171.95096
Coefficient of variation (CV): 1.9240713
Kurtosis: 38.949965
Mean: 89.368286
Median Absolute Deviation (MAD): 34
Skewness: 4.8009472
Sum: 2.520393 × 10^8
Variance: 29567.132
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
1                     734852    26.1%
30                    190705    6.8%
25                    99846     3.5%
36                    94404     3.3%
28                    77838     2.8%
27                    75507     2.7%
44                    60261     2.1%
35                    57451     2.0%
32                    44830     1.6%
43                    34560     1.2%
Other values (2219)   1349978   47.9%

Minimum 10 values

Value   Count    Frequency (%)
1       734852   26.1%
17      300      < 0.1%
18      128      < 0.1%
19      382      < 0.1%
20      9969     0.4%
21      4618     0.2%
22      10260    0.4%
23      27778    1.0%
24      6487     0.2%
25      99846    3.5%

Maximum 10 values

Value   Count   Frequency (%)
6719    1       < 0.1%
6658    1       < 0.1%
5856    3       < 0.1%
5656    1       < 0.1%
5310    1       < 0.1%
5168    1       < 0.1%
5088    1       < 0.1%
4933    2       < 0.1%
4872    1       < 0.1%
4850    1       < 0.1%

Outsideend
Real number (ℝ)

High correlation 

Distinct: 3497
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 177.56816
Minimum: 3
Maximum: 13719
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.5 MiB

Quantile statistics

Minimum: 3
5-th percentile: 3
Q1: 33
Median: 81
Q3: 185
95-th percentile: 732
Maximum: 13719
Range: 13716
Interquartile range (IQR): 152

Descriptive statistics

Standard deviation: 295.40886
Coefficient of variation (CV): 1.6636364
Kurtosis: 44.989715
Mean: 177.56816
Median Absolute Deviation (MAD): 62
Skewness: 4.7425366
Sum: 5.0078341 × 10^8
Variance: 87266.397
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count     Frequency (%)
3                     188617    6.7%
4                     120689    4.3%
9                     98390     3.5%
14                    67965     2.4%
38                    30526     1.1%
19                    30140     1.1%
33                    24236     0.9%
30                    22163     0.8%
28                    21021     0.7%
32                    20753     0.7%
Other values (3487)   2195732   77.9%

Minimum 10 values

Value   Count    Frequency (%)
3       188617   6.7%
4       120689   4.3%
5       15436    0.5%
6       570      < 0.1%
8       62       < 0.1%
9       98390    3.5%
10      3333     0.1%
11      224      < 0.1%
12      5041     0.2%
14      67965    2.4%

Maximum 10 values

Value   Count   Frequency (%)
13719   1       < 0.1%
13225   1       < 0.1%
9455    1       < 0.1%
9097    1       < 0.1%
8731    1       < 0.1%
8300    1       < 0.1%
7942    1       < 0.1%
7748    1       < 0.1%
7656    2       < 0.1%
7655    2       < 0.1%

Phage_source
Categorical

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.1 MiB
MGV: 830363
GPD: 741785
TemPhD: 437596
GOV2: 384232
CHVD: 198934
Other values (8): 227322

Length

Max length: 8
Median length: 3
Mean length: 3.8176186
Min length: 3

Characters and Unicode

Total characters: 10766570
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RefSeq
2nd row: RefSeq
3rd row: RefSeq
4th row: RefSeq
5th row: RefSeq

Common Values

Value              Count    Frequency (%)
MGV                830363   29.4%
GPD                741785   26.3%
TemPhD             437596   15.5%
GOV2               384232   13.6%
CHVD               198934   7.1%
GVD                80967    2.9%
RefSeq             43567    1.5%
IGVD               33306    1.2%
PhagesDB           32227    1.1%
Genbank            20549    0.7%
Other values (3)   16706    0.6%
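
The frequency column of this table can be recomputed from the counts. A short sketch with the counts copied from the Phage_source common values; the dictionary name `phage_source_counts` is only for illustration:

```python
# Counts copied from the Phage_source common-values table.
phage_source_counts = {
    "MGV": 830_363,
    "GPD": 741_785,
    "TemPhD": 437_596,
    "GOV2": 384_232,
    "CHVD": 198_934,
    "GVD": 80_967,
    "RefSeq": 43_567,
    "IGVD": 33_306,
    "PhagesDB": 32_227,
    "Genbank": 20_549,
    "Other values (3)": 16_706,
}

# The counts should add up to the number of observations in the dataset.
total = sum(phage_source_counts.values())

# Percentage share of each source, rounded to one decimal as in the table.
shares = {name: round(100 * count / total, 1) for name, count in phage_source_counts.items()}

for name, pct in shares.items():
    print(f"{name:<18}{pct:>5.1f}%")
```

The counts sum to 2820232, matching the number of observations, and the four largest sources (MGV, GPD, TemPhD, GOV2) together account for roughly 85% of rows.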

Length

[Figure: histogram of lengths of the category]

Words

Value              Count    Frequency (%)
mgv                830363   29.4%
gpd                741785   26.3%
temphd             437596   15.5%
gov2               384232   13.6%
chvd               198934   7.1%
gvd                80967    2.9%
refseq             43567    1.5%
igvd               33306    1.2%
phagesdb           32227    1.1%
genbank            20549    0.7%
Other values (3)   16706    0.6%

Most occurring characters

Value               Count     Frequency (%)
G                   2091202   19.4%
V                   1541926   14.3%
D                   1527933   14.2%
P                   1211608   11.3%
M                   831386    7.7%
e                   577506    5.4%
h                   469823    4.4%
T                   451720    4.2%
m                   437596    4.1%
O                   384232    3.6%
Other values (18)   1241638   11.5%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   10766570   100.0%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   10766570   100.0%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   10766570   100.0%

Interactions

[Figure: pairwise interaction scatter plots of the numeric variables (121 panels, Matplotlib v3.9.2); images not reproduced.]

Correlations

                      Expnumberfirst60AAs  ExpnumberofAAsinTMHs  Insideend  Insidestart  Length  Outsideend  Outsidestart  Phage_source  PredictedTMHsNumber  TMhelixend  TMhelixstart  TotalprobofNin
Expnumberfirst60AAs    1.000                0.409                -0.166      0.127       -0.351  -0.327      -0.118         0.046         0.355               -0.143      -0.163         0.156
ExpnumberofAAsinTMHs   0.409                1.000                 0.512      0.728        0.268   0.322       0.584         0.047         0.872                0.692       0.672         0.089
Insideend             -0.166                0.512                 1.000      0.783        0.516   0.114       0.286         0.014         0.530                0.662       0.655        -0.259
Insidestart            0.127                0.728                 0.783      1.000        0.285   0.114       0.250         0.012         0.780                0.660       0.655        -0.190
Length                -0.351                0.268                 0.516      0.285        1.000   0.738       0.439         0.018         0.260                0.480       0.484        -0.011
Outsideend            -0.327                0.322                 0.114      0.114        0.738   1.000       0.728         0.018         0.306                0.607       0.619         0.240
Outsidestart          -0.118                0.584                 0.286      0.250        0.439   0.728       1.000         0.017         0.620                0.761       0.763         0.288
Phage_source           0.046                0.047                 0.014      0.012        0.018   0.018       0.017         1.000         0.047                0.012       0.012         0.028
PredictedTMHsNumber    0.355                0.872                 0.530      0.780        0.260   0.306       0.620         0.047         1.000                0.676       0.678         0.102
TMhelixend            -0.143                0.692                 0.662      0.660        0.480   0.607       0.761         0.012         0.676                1.000       0.994         0.093
TMhelixstart          -0.163                0.672                 0.655      0.655        0.484   0.619       0.763         0.012         0.678                0.994       1.000         0.102
TotalprobofNin         0.156                0.089                -0.259     -0.190       -0.011   0.240       0.288         0.028         0.102                0.093       0.102         1.000
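
The correlation values above can be cross-checked against the report's "High correlation" alerts. The following is a minimal sketch, not ydata-profiling's own code: it assumes an absolute-correlation threshold of 0.5, and the function name `high_correlation_partners` is hypothetical. The names and values are copied from the correlation table.

```python
# Variable names and correlation values copied from the correlation table.
NAMES = [
    "Expnumberfirst60AAs", "ExpnumberofAAsinTMHs", "Insideend", "Insidestart",
    "Length", "Outsideend", "Outsidestart", "Phage_source",
    "PredictedTMHsNumber", "TMhelixend", "TMhelixstart", "TotalprobofNin",
]
CORR = [
    [1.000, 0.409, -0.166, 0.127, -0.351, -0.327, -0.118, 0.046, 0.355, -0.143, -0.163, 0.156],
    [0.409, 1.000, 0.512, 0.728, 0.268, 0.322, 0.584, 0.047, 0.872, 0.692, 0.672, 0.089],
    [-0.166, 0.512, 1.000, 0.783, 0.516, 0.114, 0.286, 0.014, 0.530, 0.662, 0.655, -0.259],
    [0.127, 0.728, 0.783, 1.000, 0.285, 0.114, 0.250, 0.012, 0.780, 0.660, 0.655, -0.190],
    [-0.351, 0.268, 0.516, 0.285, 1.000, 0.738, 0.439, 0.018, 0.260, 0.480, 0.484, -0.011],
    [-0.327, 0.322, 0.114, 0.114, 0.738, 1.000, 0.728, 0.018, 0.306, 0.607, 0.619, 0.240],
    [-0.118, 0.584, 0.286, 0.250, 0.439, 0.728, 1.000, 0.017, 0.620, 0.761, 0.763, 0.288],
    [0.046, 0.047, 0.014, 0.012, 0.018, 0.018, 0.017, 1.000, 0.047, 0.012, 0.012, 0.028],
    [0.355, 0.872, 0.530, 0.780, 0.260, 0.306, 0.620, 0.047, 1.000, 0.676, 0.678, 0.102],
    [-0.143, 0.692, 0.662, 0.660, 0.480, 0.607, 0.761, 0.012, 0.676, 1.000, 0.994, 0.093],
    [-0.163, 0.672, 0.655, 0.655, 0.484, 0.619, 0.763, 0.012, 0.678, 0.994, 1.000, 0.102],
    [0.156, 0.089, -0.259, -0.190, -0.011, 0.240, 0.288, 0.028, 0.102, 0.093, 0.102, 1.000],
]


def high_correlation_partners(names, corr, threshold=0.5):
    """Map each variable to the other variables with |r| >= threshold.

    The 0.5 threshold is an assumption made for this sketch.
    """
    partners = {}
    for i, name in enumerate(names):
        partners[name] = [
            names[j]
            for j, r in enumerate(corr[i])
            if j != i and abs(r) >= threshold
        ]
    return partners


partners = high_correlation_partners(NAMES, CORR)
for name, others in partners.items():
    if others:
        print(f"{name}: {len(others)} partner(s): {', '.join(others)}")
```

Under that assumed threshold, `Length` pairs with `Insideend` and `Outsideend`, and `TMhelixend` pairs with seven other fields, which agrees with the counts in the Alerts section.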

Missing values

A simple visualization of nullity by column (image not reproduced).
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion (image not reproduced).
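
Nullity by column can also be tabulated without the plots. The sketch below is a plain-Python illustration under stated assumptions, not the code the report uses: the helper names `is_missing` and `nullity_by_column` are hypothetical, and the input is only the `Length` and `POSSIBLENterm` values of the first ten sample rows of this report.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows):
    """Return {column: (missing_count, missing_fraction)} for a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    total = len(rows)
    return {column: (n, n / total) for column, n in counts.items()}


NAN = float("nan")

# Length and POSSIBLENterm for the first ten sample rows of this report.
rows = [
    {"Length": 75, "POSSIBLENterm": True},
    {"Length": 30, "POSSIBLENterm": True},
    {"Length": 83, "POSSIBLENterm": NAN},
    {"Length": 82, "POSSIBLENterm": True},
    {"Length": 437, "POSSIBLENterm": True},
    {"Length": 424, "POSSIBLENterm": NAN},
    {"Length": 29, "POSSIBLENterm": True},
    {"Length": 33, "POSSIBLENterm": True},
    {"Length": 84, "POSSIBLENterm": True},
    {"Length": 365, "POSSIBLENterm": NAN},
]

summary = nullity_by_column(rows)
for column, (count, fraction) in summary.items():
    print(f"{column}: {count} missing ({fraction:.1%})")
```

On the full dataset the same tally would be expected to single out `POSSIBLENterm`, the only column the Alerts section flags for missing values.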

Sample

         Phage_ID      Protein_ID       Length  PredictedTMHsNumber  ExpnumberofAAsinTMHs  Expnumberfirst60AAs  TotalprobofNin  POSSIBLENterm  Insidesource  Insidestart  Insideend  TMhelixsource  TMhelixstart  TMhelixend  Outsidesource  Outsidestart  Outsideend  Phage_source
0        NC_001330.1   NP_039595.1      75      1                    22.46252              22.46221             0.44551         True           TMHMM2.0      33.0         75.0       TMHMM2.0       10.0          32.0        TMHMM2.0       1.0           9.0         RefSeq
1        NC_001331.1   NP_039601.1      30      1                    19.48607              19.48607             0.86987         True           TMHMM2.0      1.0          6.0        TMHMM2.0       7.0           29.0        TMHMM2.0       30.0          30.0        RefSeq
2        NC_001331.1   NP_039602.1      83      1                    23.01476              5.44076              0.03037         NaN            TMHMM2.0      80.0         83.0       TMHMM2.0       57.0          79.0        TMHMM2.0       1.0           56.0        RefSeq
3        NC_001331.1   NP_039603.1      82      2                    43.70290              25.09221             0.99660         True           TMHMM2.0      80.0         82.0       TMHMM2.0       57.0          79.0        TMHMM2.0       43.0          56.0        RefSeq
4        NC_001331.1   NP_039604.1      437     2                    37.26104              19.30140             0.90531         True           TMHMM2.0      436.0        437.0      TMHMM2.0       418.0         435.0       TMHMM2.0       27.0          417.0       RefSeq
5        NC_001331.1   NP_039606.1      424     1                    19.07514              0.00018              0.83822         NaN            TMHMM2.0      1.0          228.0      TMHMM2.0       229.0         248.0       TMHMM2.0       249.0         424.0       RefSeq
6        NC_001332.1   NP_039618.1      29      1                    22.75097              22.75097             0.42203         True           TMHMM2.0      28.0         29.0       TMHMM2.0       5.0           27.0        TMHMM2.0       1.0           4.0         RefSeq
7        NC_001332.1   NP_039619.1      33      1                    21.27578              21.27578             0.26393         True           TMHMM2.0      26.0         33.0       TMHMM2.0       4.0           25.0        TMHMM2.0       1.0           3.0         RefSeq
8        NC_001332.1   NP_039620.1      84      2                    35.57871              21.53046             0.97217         True           TMHMM2.0      77.0         84.0       TMHMM2.0       59.0          76.0        TMHMM2.0       31.0          58.0        RefSeq
9        NC_001332.1   NP_039622.1      365     1                    23.10212              0.02499              0.42621         NaN            TMHMM2.0      273.0        365.0      TMHMM2.0       255.0         272.0       TMHMM2.0       1.0           254.0       RefSeq

         Phage_ID      Protein_ID       Length  PredictedTMHsNumber  ExpnumberofAAsinTMHs  Expnumberfirst60AAs  TotalprobofNin  POSSIBLENterm  Insidesource  Insidestart  Insideend  TMhelixsource  TMhelixstart  TMhelixend  Outsidesource  Outsidestart  Outsideend  Phage_source
2820222  biochar_6172  biochar_6172_12  56      2                    44.39544              44.39544             0.25832         True           TMHMM2.0      27.0         30.0       TMHMM2.0       31.0          53.0        TMHMM2.0       54.0          56.0        STV
2820223  biochar_6173  biochar_6173_8   64      2                    45.35974              44.28917             0.29958         True           TMHMM2.0      28.0         39.0       TMHMM2.0       40.0          62.0        TMHMM2.0       63.0          64.0        STV
2820224  biochar_6173  biochar_6173_10  174     4                    95.78472              35.60953             0.69341         True           TMHMM2.0      124.0        135.0      TMHMM2.0       136.0         158.0       TMHMM2.0       159.0         174.0       STV
2820225  biochar_6173  biochar_6173_11  29      1                    20.66581              20.66581             0.51916         True           TMHMM2.0      25.0         29.0       TMHMM2.0       5.0           24.0        TMHMM2.0       1.0           4.0         STV
2820226  biochar_6173  biochar_6173_16  89      1                    24.42941              24.42612             0.84030         True           TMHMM2.0      1.0          6.0        TMHMM2.0       7.0           29.0        TMHMM2.0       30.0          89.0        STV
2820227  biochar_6175  biochar_6175_6   52      2                    36.49023              36.49023             0.34638         True           TMHMM2.0      24.0         29.0       TMHMM2.0       30.0          51.0        TMHMM2.0       52.0          52.0        STV
2820228  biochar_6175  biochar_6175_10  222     1                    22.77331              22.75221             0.84202         True           TMHMM2.0      1.0          6.0        TMHMM2.0       7.0           29.0        TMHMM2.0       30.0          222.0       STV
2820229  biochar_6175  biochar_6175_14  106     1                    19.39243              19.39099             0.54127         True           TMHMM2.0      1.0          4.0        TMHMM2.0       5.0           24.0        TMHMM2.0       25.0          106.0       STV
2820230  biochar_6180  biochar_6180_1   812     1                    21.67516              21.66885             0.97053         True           TMHMM2.0      1.0          6.0        TMHMM2.0       7.0           29.0        TMHMM2.0       30.0          812.0       STV
2820231  biochar_6180  biochar_6180_5   683     1                    19.28219              17.64365             0.87138         True           TMHMM2.0      1.0          41.0       TMHMM2.0       42.0          61.0        TMHMM2.0       62.0          683.0       STV
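
In the sample rows, the Inside, TMhelix, and Outside coordinates appear to form back-to-back residue ranges. The sketch below checks that property for three of the rows. It is only an illustration of an observation on this sample, not a rule taken from TMHMM, and the function name `segments_are_contiguous` is hypothetical.

```python
def segments_are_contiguous(segments):
    """True if, ordered by start, each segment begins right after the previous one ends."""
    ordered = sorted(segments)
    return all(
        next_start == prev_end + 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


# (start, end) pairs from the Inside, TMhelix, and Outside columns
# of sample rows 0, 3, and 4.
sample = {
    0: [(33, 75), (10, 32), (1, 9)],
    3: [(80, 82), (57, 79), (43, 56)],
    4: [(436, 437), (418, 435), (27, 417)],
}

results = {index: segments_are_contiguous(segs) for index, segs in sample.items()}
for index, ok in results.items():
    print(f"row {index}: contiguous={ok}")
```

A gap or overlap between ranges makes the check return False, so it could serve as a cheap sanity test before trusting the coordinates of a row.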